RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md) by potiuk · Pull Request #677 · apache/creadur-rat

potiuk · 2026-06-10T17:38:14Z

What

Adds a threat model for Apache Creadur (RAT) at the Creadur PMC's request (GLASSWING / Mythos scan pre-flight):

THREAT_MODEL.md — the model (rubric).
SECURITY.md + AGENTS.md — disclosure pointer + the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md chain.

The model in brief

RAT is modelled as an in-process build/CLI license-audit tool — not a network service, and explicitly not a security/vulnerability scanner. Its security-relevant case is auditing untrusted input: the XML configuration (XXE surface) and archive descent (decompression-bomb surface). Findings that require RAT to process input the operator already trusts (the normal case — your own source tree) are out of model.

DRAFT — you own it; two quick technical confirmations

Because RAT is small, the §8-vs-§9 split hinges on two facts I've left as section 14 questions:

Q3 — does XMLConfigurationReader disable DOCTYPE/external entities (XXE-safe)?
Q4 — does ArchiveWalker bound decompression (size/depth/entry-count)?

Your answers turn those from "open question" into either a provided property (§8) or a documented gap + downstream note (§9). Also Q6: want me to add the same chain to creadur-whisker and creadur-tentacles so all three are discoverable?

Generated by the ASF Security team's threat-model tooling (Claude Opus); reviewed before opening.

Rebased onto current master, which already added AGENTS.md and SECURITY.md. Keeps both maintainer files and adds the detailed THREAT_MODEL.md plus the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointers. Generated-by: Claude Opus 4.8 (1M context)

Claudenw · 2026-06-15T14:40:44Z

This PR looks like it needs answers from developers before submitting.
@potiuk is that the expected flow? We answer questions and push the resulting file(s) to the repo?

potiuk · 2026-06-15T22:02:58Z

Yes. Absolutely. - it's enough we just comment in the PR answering the questions and I will update the PR accordingly

Claudenw · 2026-06-16T16:03:46Z

@potiuk

In answer to the first question, there is a PR to ensure we have this covered.
In answerto the second question, we utilize Apache Common Compress ArchiveStreamFactory class to create archive streams. If the filters indicate that we should process the archived file we extract the contents into an in-memory buffer where it is held until the document is processed. So we could hit OOM.

Incorporates the Creadur PMC's PR apache#677 review: - archive walker confirmed unbounded (in-memory extraction) -> §9 gap + §10 - XML/DOCTYPE hardening noted as in-flight PMC PR (§14 Q3, link pending) - documents RAT write mode (--addLicense) as trusted-input / out-of-model - notes CLI/Ant/Maven front-ends are generated from a common core - §15 corrected: SECURITY.md already exists (added via apache#671) Generated-by: Claude Opus 4.8

potiuk · 2026-06-17T01:28:18Z

Thanks Claudenw and ottlinger — folded your review into the draft and pushed (THREAT_MODEL.md, +53/-21):

Archive handling (your Q on OOM): confirmed as a gap. The unbounded in-memory extraction via Commons Compress ArchiveStreamFactory is now §9 (not a provided property), with the matching §10 downstream responsibility (sandbox / resource-limit RAT runs over untrusted archives) and a VALID-HARDENING disposition. Marked maintainer-confirmed.
XML/DOCTYPE (your "there is a PR to ensure we have this covered"): I read that as the XXE-hardening question (§14 Q3). I've noted a hardening PR is in flight but left §8 Gradle RAT: Add plainOutput, xmlOutput and htmlOutput options #2 tentative pending the link — could you drop the PR number here so I can cite it? Once it lands I'll flip XXE to a provided property.
ottlinger — write mode: good catch. RAT's --addLicense / editor mode writes headers into the audited tree; I've documented it explicitly in §2/§3 as operator-invoked on the operator's own (trusted) sources → it's OUT-OF-MODEL: trusted-input, but now stated rather than silent.
ottlinger — generated front-ends: added a §2 note that CLI/Ant/Maven are generated from a common option core, so a security property (or gap) in the core transfers to all three UIs.
ottlinger — SECURITY.md / RAT-558: Add proposals for AGENTS and SECURITY configuration - to be discussed #671: thanks, fixed the stale §15 line — it now reflects that RAT-558: Add proposals for AGENTS and SECURITY configuration - to be discussed #671 added SECURITY.md and this PR just appends the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md pointer.

Still open if you have a moment (one line each is plenty): Q1 (confirm the untrusted-input case is the one to model), Q2 (RAT makes no network connections), Q5 (Whisker/Tentacles share the profile), and Q6 (want us to add the same pointer files to creadur-whisker/-tentacles, or will you?).

One note on CI: the failing "Build and analyze" (CodeQL) check is unrelated to this PR — it's a docs-only change (three .md files), so it isn't introducing or affected by that build job; looks pre-existing/flaky on the branch.

Claudenw · 2026-06-17T07:02:37Z

@potiuk There is one more point that has not been discussed. RAT allows developers to extend the matching algorithms. See https://creadur.apache.org/rat/license_def.html#Matchers

The upshot is that 3rd parties can create new matchers and use them in license checks. Matchers are different from license checks in that license checks use matchers. For example the Apache 2.0 license check uses the text, spdx and any matchers.

Matchers scan the contents of the file (as a String) looking for matches. This means that a custom matcher would have access to all text from all files that are selected for scanning.

But this is defined in the configuration and is under control of the developer using RAT.

ottlinger · 2026-06-17T12:37:47Z

@potiuk the Sonarbuild does only run on specific branches/with specific PRs as the credentials are not shared among all PRs/builds due to ASF restrictions.

Claudenw · 2026-06-17T16:16:44Z

could you drop the PR number here so I can cite it? Once it lands I'll flip XXE to a provided property.

#679 is the PR that does the XXE hardening.

Per Claudenw (PR apache#677): RAT lets operators define custom matcher classes that see all scanned file text, but the matcher set is operator-defined config (not attacker-supplied), so it's OUT-OF-MODEL: trusted-input — same posture as the write mode. Generated-by: Claude Opus 4.8 (1M context)

potiuk · 2026-06-18T22:40:20Z

Thanks Claudenw — folded the custom-matcher surface into §3: RAT lets operators define custom matcher classes that see the full text of every scanned file, but since the matcher set is operator-defined config (not attacker-supplied), a custom matcher reading scanned text is OUT-OF-MODEL: trusted-input — the same posture as the write mode. Marked maintainer-confirmed.

ottlinger — thanks for the CI clarification; that matches what we expected (the Sonar/CodeQL job is credential-gated to specific branches under ASF restrictions, so its red on this fork PR is environmental, not anything this docs-only change introduces).

Still one open item: the XXE-hardening PR number (§14 Q3) — I've left §8 #2 tentative pending it. Whenever you drop the number I'll cite it and flip XXE from "hardening in flight" to a provided property. No rush.

The remaining §14 questions (Q1 untrusted-input posture, Q2 no-network, Q5 Whisker/Tentacles profile, Q6 sibling pointer files) are still open whenever convenient — one line each is plenty.

ottlinger · 2026-06-19T09:27:24Z

@potiuk - thanks again:

Q5:as development on Tentacles/Whisker is rather low at the moment I'd personally prefer to start with RAT and add Tentacles/Whisker later in order to reduce noise.
Q2: AFAIK RAT does not open network connections as it runs locally; except for downloading stuff via Maven/Ant/buildtools in order to build the artifacts locally.

…d (external entities disabled, apache#679 hardens DOCTYPE), no-network confirmed (XSLT xsl:include caveat), correct archive path-handling (read to memory, no extract-to-disk → no path traversal), Whisker/Tentacles deferred

potiuk · 2026-06-21T19:20:48Z

Thanks @ottlinger and @Claudenw — that's everything we needed. I've folded all your answers into THREAT_MODEL.md (pushed). Summary of what changed:

Trust posture (Q1) — confirmed and now (maintainer): RAT config (XSLT, config files, license defs, custom matchers) is operator-trusted; the scanned files are the untrusted surface. §2/§7.
No network (Q2) — confirmed (maintainer). Added your XSLT xsl:include nuance: the one operator-reachable way out is a remote xsl:include, and since stylesheets are trusted config that's OUT-OF-MODEL. §5/§8 Upgrade Apache Commons Collections to v3.2.2 #3.
XXE (Q3) — external entities are disabled, so §8 Gradle RAT: Add plainOutput, xmlOutput and htmlOutput options #2 is now a provided property (was tentative); noted PR RAT-560: changes to reduce XXE exposure #679 as the DOCTYPE-hardening follow-up, and the JAXP-system-properties configurability. §5/§5a/§8.
Archive bound (Q4) — kept as a disclaimed §9 gap (no bound, OOM not guarded).
Path handling — corrected a phantom risk: since RAT reads entries into memory and never extracts to disk, there's no zip-slip / path-traversal-on-write surface. An entry label like bar/baz.zip#/junk.txt is just a report string. §6/§9.
Whisker/Tentacles (Q5/Q6) — scoped this PR to creadur-rat per your preference; the sibling pointer files are a deferred follow-up.

With every §14 question answered, the model is ready to ratify whenever the PMC's happy with it. (The red check is the CodeQL "Build and analyze" job, which is unrelated to these doc-only changes — all 13 build/test matrix jobs pass.)

ottlinger · 2026-06-26T17:19:01Z

@potiuk if we merge this PR we are done and trigger the next step or are there any other things that need to be done on our side? Thanks

potiuk · 2026-06-26T19:39:15Z

@ottlinger @Claudenw — quick note on the red check: that's only the SonarCloud job, which can't run on PRs because the ASF doesn't share its Sonar credentials with PR builds (as you noted). It's not an actual test failure, so it's safe to merge past it. And yes — merging this is the last step on your side; once it lands I'll verify the AGENTS.md → SECURITY.md → THREAT_MODEL.md chain resolves and Creadur/RAT enters the queue. (Starting with RAT and adding Tentacles/Whisker later is fine.)

ottlinger · 2026-06-26T21:24:31Z

@Claudenw anything to add from your side? I'm fine with merging in order to go forward. Thx

ottlinger

LGTM as a first draft

Claudenw

I will have to spend more time thinking about the threat model. I think with license headers, and the XXE questions resolved we can proceed.

- THREAT_MODEL.md: use https for the license URL; clarify that --addLicense write mode inserts text from the operator-controlled license definition, not from the scanned tree. - AGENTS.md, SECURITY.md: add the ASF license header. - SECURITY.md: replace the TBD intro with a short security-posture summary and a "Reporting a Vulnerability" section. Generated-by: Claude Opus 4.8 (1M context)

potiuk · 2026-06-27T07:31:42Z

Thanks for the careful review, @Claudenw and @ottlinger — really appreciate it. All six threads are addressed in a6978b8 and resolved:

✅ https for the license URL
✅ ASF license headers added to AGENTS.md and SECURITY.md
✅ SECURITY.md TBD → short security-posture summary + a Reporting a Vulnerability section (points at security@apache.org; detailed model stays in THREAT_MODEL.md)
✅ §3 write-mode note now spells out that --addLicense inserts text from the operator-controlled license definition, not the scanned tree (thanks Claude for the "controlled input file" framing)
✅ the Maven/Ant/CLI "generated front-ends" point was already captured in the §2 note — pointed there

Should be good to go now — shout if anything else stands out. Thanks again! 🙏

potiuk · 2026-06-27T07:38:37Z

BTW. @Claudenw Usually (in order to save tokens) we either use no licence or we use short SPDX form of it - and usually exclude the AGENTS.md/SECURITY.md from release - to be 100% in line with expectation of full licences in released source packages.

We had a lengthy discussion about it - without complete conclusion, but the discussion seemed to converge on "If you are not releasing the .md files, it's fine to have no licence or SPDX or some other short version of it. This is generally OK - according to https://www.apache.org/legal/src-headers.html#is-a-short-form-of-the-source-header-available - where one such example short form is allowed in justified cases, while SPDX seems to be another good form. and we usually exclude those files in .rat excludes and ignore on source releases.

Long thread about it (with my assumptions summarized) can be found here - https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53

If you wish - we can use either approach.

Greeting from NY Claude :)

…e labels Answers Claudenw's review note (does apache#679 impact the XXE data-flow line?): the §5a/§8 text already records that RAT disables external entities + the apache#679 DOCTYPE hardening, but the data-flow diagram and the input/residual tables still labelled XXE a bare "surface". Annotate those three labels with the mitigation so the diagram is consistent with §5a/§8 apache#2. Generated-by: Claude Opus 4.8 (1M context)

Consistency with THREAT_MODEL.md (§5a / §8 apache#2): since RAT-560 (apache#679) RAT builds XML parsers via the hardened StandardXmlFactory (DOCTYPE + external entities disabled), so XXE is actively prevented. Lead with that; keep the operator-trusted-config argument as defense-in-depth. Generated-by: Claude Opus 4.8 (1M context)

ottlinger changed the title ~~Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md)~~ RAT-558: Add security threat model (THREAT_MODEL.md + SECURITY.md + AGENTS.md) Jun 11, 2026

potiuk force-pushed the asf-security/threat-model-2026-06-10 branch from 35879b0 to d4f0fdd Compare June 14, 2026 01:19

ottlinger reviewed Jun 16, 2026

View reviewed changes

Comment thread THREAT_MODEL.md

ottlinger reviewed Jun 16, 2026

View reviewed changes

Comment thread THREAT_MODEL.md

ottlinger reviewed Jun 16, 2026

View reviewed changes

Comment thread THREAT_MODEL.md Outdated

ottlinger requested a review from Claudenw June 16, 2026 12:40

Claudenw reviewed Jun 19, 2026

View reviewed changes

ottlinger requested review from Claudenw and ottlinger June 26, 2026 21:23

ottlinger approved these changes Jun 26, 2026

View reviewed changes

Claudenw requested changes Jun 27, 2026

View reviewed changes

Comment thread THREAT_MODEL.md Outdated

Comment thread AGENTS.md Outdated

Comment thread SECURITY.md

Comment thread SECURITY.md Outdated

Comment thread THREAT_MODEL.md

Comment thread THREAT_MODEL.md Outdated

potiuk added 2 commits June 27, 2026 03:45

Uh oh!

Conversation

potiuk commented Jun 10, 2026

What

The model in brief

DRAFT — you own it; two quick technical confirmations

Uh oh!

Claudenw commented Jun 15, 2026

Uh oh!

potiuk commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Claudenw commented Jun 16, 2026

Uh oh!

potiuk commented Jun 17, 2026

Uh oh!

Claudenw commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ottlinger commented Jun 17, 2026

Uh oh!

Claudenw commented Jun 17, 2026

Uh oh!

potiuk commented Jun 18, 2026

Uh oh!

ottlinger commented Jun 19, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

potiuk commented Jun 21, 2026

Uh oh!

ottlinger commented Jun 26, 2026

Uh oh!

potiuk commented Jun 26, 2026

Uh oh!

ottlinger commented Jun 26, 2026

Uh oh!

ottlinger left a comment

Choose a reason for hiding this comment

Uh oh!

Claudenw left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

potiuk commented Jun 27, 2026

Uh oh!

potiuk commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

potiuk commented Jun 15, 2026 •

edited

Loading

Claudenw commented Jun 17, 2026 •

edited

Loading

potiuk commented Jun 27, 2026 •

edited

Loading